Onwards!
In this report, we focus on the initial set of 99,900 validators controlled by the Ethereum Foundation and the client teams. This report was compiled with data until epoch 2600 (2020-11-30 01:20:00).
Clients were roughly equally distributed across the network at genesis. The EF operates around 20% of the validator set associated with each client, while the remaining validators are maintained by the team behind the client itself.
In the following, statistics are obtained over all EF- and client team-controlled validators, unless otherwise noted. In particular, we do not include data from validators activated after genesis, or from validators not controlled by the EF or the client teams.
We observe many more incorrect head attestations when the attestation is made for the starting slot of a new epoch. We call `slot_index` the index of the slot within its epoch (from 0 to 31).
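With 32 slots per epoch on the beacon chain, the index of a slot within its epoch is simply the slot modulo 32; a minimal sketch:

```python
SLOTS_PER_EPOCH = 32  # beacon chain constant

def slot_index(slot: int) -> int:
    """Index of a slot within its epoch, from 0 to 31."""
    return slot % SLOTS_PER_EPOCH
```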
Attesters get the head wrong whenever the block they are supposed to attest to is late, arriving well after the attestation was published. We can check which clients produce these late blocks.
Since these late blocks seem to happen more often at the start of an epoch than at the end, it is quite clear that epoch processing is at fault, with some clients likely spending more time processing the epoch and unable to publish the block on time.
We can also check over time how the performance of validators on blocks at slot index 0 evolves, again plotting per client who is expected to produce the block at slot index 0.
Validators attesting on Teku-expected blocks at slot index 0 performed better at a time when the chain experienced difficulty and the number of blocks produced was lower, around epochs 200 to 300, which lines up with the suggested explanation of long epoch processing times.
In the plots below, we align on the y-axis validators activated at genesis. A point on the plot is coloured in green when the validator has managed to get their attestation included for the epoch given on the x-axis. Otherwise, the point is coloured in red. Note that we do not check for the correctness of the attestation, merely its presence in some block of the beacon chain.
The plots allow us to check when a particular client is experiencing issues, at which point some share of validators of that client will be unable to publish their attestations.
A block can include at most 128 aggregate attestations. How many aggregate attestations did each client include on average?
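As a sketch of how such an average could be computed, assuming a hypothetical table with one row per aggregate attestation included in a block, tagged with the client of the proposer (column names are illustrative, not the report's actual schema):

```python
import pandas as pd

# Hypothetical schema: one row per aggregate attestation included in a block
aggs = pd.DataFrame({
    "block_slot":      [1, 1, 1, 2, 2],
    "proposer_client": ["Teku", "Teku", "Teku", "Nimbus", "Nimbus"],
})

# Count aggregates per block, then average over each proposing client's blocks
per_block = aggs.groupby(["proposer_client", "block_slot"]).size()
avg_per_client = per_block.groupby("proposer_client").mean()
```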
Smaller blocks lead to a healthier network, as long as they do not leave attestations aside. We check how each client manages redundancy in the next sections.
Myopic redundant aggregates were already published, with the same attesting indices, in a previous block.
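A minimal way to count myopic redundant aggregates is to represent each aggregate by its (frozen) set of attesting indices and scan blocks in chronological order; a sketch, not the report's actual pipeline:

```python
def count_myopic_redundant(blocks):
    """blocks: chronological list of blocks, each a list of frozensets
    of attesting indices (one frozenset per included aggregate).
    Counts aggregates whose exact attesting indices already appeared
    in a previous block."""
    seen = set()
    redundant = 0
    for block in blocks:
        for agg in block:
            if agg in seen:
                redundant += 1
        # Mark as seen only after scanning the whole block: same-block
        # repeats are not "myopic redundant" under this definition.
        seen.update(block)
    return redundant
```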
Subset aggregates are aggregates included in a block which are fully covered by another aggregate included in the same block. Namely, when aggregate 1 has attesting indices \(I\) and aggregate 2 has attesting indices \(J\), aggregate 1 is a subset aggregate when \(I \subset J\).
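Following this definition, a subset aggregate can be detected by a pairwise strict-inclusion check within a block; a sketch (real aggregation bitlists would make this cheaper):

```python
def count_subset_aggregates(block):
    """block: list of sets of attesting indices, one per aggregate.
    Counts aggregates strictly contained in another aggregate of the
    same block, i.e. I ⊂ J."""
    return sum(
        1
        for i, a in enumerate(block)
        if any(i != j and a < b for j, b in enumerate(block))
    )
```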
Lighthouse and Nimbus both score a perfect 0.
We first look at the reward rates per client since genesis.
Validators controlled by the Ethereum Foundation are hosted on AWS nodes scattered across four regions in roughly equal proportions. We look at the reward rates per region.
Performing an omnibus test to detect significant difference between any of the four groups, we are unable to find such significance at epoch 800. Not long after, an experiment was performed which we describe in the next section. Before doing so, we investigate reward rates per client for validators controlled by the client team.
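The report does not specify which omnibus test was used; as an illustration, a one-way ANOVA F statistic over the four region groups could be computed as follows (pure-Python sketch, assuming one list of reward rates per region):

```python
def one_way_anova_F(groups):
    """One-way ANOVA F statistic across groups of reward rates.
    groups: list of lists of per-validator reward rates, one per region."""
    all_vals = [x for g in groups for x in g]
    n, k = len(all_vals), len(groups)
    grand_mean = sum(all_vals) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within)
```

A large F value (relative to the F distribution with those degrees of freedom) would indicate a significant difference between at least two regions.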
While we presented reward rates for all validators per client above, our results may conflate several competing effects. For each client, 20% of all validators are controlled by the EF, and all EF-controlled validators run on the same hardware for the first 1000 epochs or so (more on this in the next section). While this setting allows us to compare the performance of all clients in a controlled environment, we also expect the client teams to have better knowledge of the hardware requirements of their own software. Thus in the following we present two analyses: first, the reward rates of all validators controlled by the EF, per client; second, the reward rates of validators controlled by the client teams.
Around epoch 1020, nodes controlled by the EF in regions 1 and 2 were scaled down from t3.xlarge instances (4 vCPUs, 16 GB memory, with unlimited CPU burst) to m5.large instances (2 vCPUs, 8 GB memory, no burst). We observe a significant loss of performance despite continuous uptime.
Large decreases in all plots below for regions 1 and 2 indicate when nodes were stopped and restarted, circa epoch 1000 for region 1 and epoch 1025 for region 2. When we compare the performance of validators before and after the scaling down of regions 1 and 2, we use epoch 900 as control and epoch 1300 as treatment.
Reward rates per client are affected in roughly equal proportions.
We explore further the difference between clients in regions 1 and 2 and in regions 3 and 4.
It seems that Teku is responsible for most of the reward decrease in regions 1 and 2. Prysm registers a significant, albeit small, decrease in reward rates between the two region groups too.
We look at four metrics across each region:
To obtain a time series, we divide the period between epoch 800 and epoch 1400 into chunks of 50 epochs. For each validator and each chunk, we record how many included attestations appear in the dataset (between 0 and 50), the number of correct targets and correct heads, and the average inclusion delay. We then average over all validators in the EF-controlled set, measuring metrics either per client or per region.
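The chunking step can be sketched with pandas, assuming a hypothetical table with one row per (validator, epoch) attestation record (column names are illustrative):

```python
import pandas as pd

# Hypothetical schema: one row per (validator, epoch) attestation record;
# inclusion_delay is missing (NaN) when the attestation was not included
df = pd.DataFrame({
    "epoch":           [800, 801, 850, 851],
    "validator":       [1, 1, 1, 1],
    "included":        [1, 1, 0, 1],
    "inclusion_delay": [1, 2, None, 4],
})

# Assign each epoch to a 50-epoch chunk, then aggregate per validator and chunk
df["chunk"] = (df["epoch"] - 800) // 50
per_chunk = df.groupby(["chunk", "validator"]).agg(
    included=("included", "sum"),
    avg_delay=("inclusion_delay", "mean"),  # NaN rows are skipped by mean
)
```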
We start by looking at the metrics per region.
Inclusion, target and head correctness all present insignificant differences between regions 1 and 2 on the one hand and regions 3 and 4 on the other. However, we observe an increase in the average inclusion delay for regions 1 and 2, which likely explains their decreased reward rates.
Teku validators log a higher inclusion delay than others after the switch to smaller containers, as well as worse performance on other duties.